
    Sparse Gaussian processes for large-scale machine learning

    Gaussian Processes (GPs) are non-parametric, Bayesian models able to achieve state-of-the-art performance in supervised learning tasks such as non-linear regression and classification, and are thus used as building blocks for more sophisticated machine learning applications. GPs also enjoy a number of other desirable properties: they are virtually overfitting-free, have sound and convenient model selection procedures, and provide so-called “error bars”, i.e., estimates of the uncertainty of their predictions. Unfortunately, full GPs cannot be directly applied to real-world, large-scale data sets due to their high computational cost. For n data samples, training a GP requires O(n³) computation time, which renders modern desktop computers unable to handle databases with more than a few thousand instances. Several sparse approximations that scale linearly with the number of data samples have recently been proposed, with the Sparse Pseudo-inputs GP (SPGP) representing the current state of the art. Sparse GP approximations can be used to deal with large databases but, of course, do not usually achieve the performance of full GPs. In this thesis we present several novel sparse GP models that compare favorably with SPGP, both in terms of predictive performance and error bar quality. Our models converge to the full GP under some conditions, but our goal is not so much to faithfully approximate full GPs as it is to develop useful models that provide high-quality probabilistic predictions; by doing so, even full GPs are occasionally outperformed. We provide two broad classes of models: Marginalized Networks (MNs) and Inter-Domain GPs (IDGPs). MNs can be seen as models that lie in between classical Neural Networks (NNs) and full GPs, trying to combine the advantages of both. Though trained differently, when used for prediction they retain the structure of classical NNs, so they can be interpreted as a novel way to train a classical NN, while adding the benefit of input-dependent error bars and overfitting resistance. IDGPs generalize SPGP by allowing the “pseudo-inputs” to lie in a different domain, thus adding extra flexibility and performance. Furthermore, they provide a convenient probabilistic framework in which previous sparse methods can be more easily understood. All the proposed algorithms are tested and compared with the current state of the art on several standard, large-scale data sets with different properties. Their strengths and weaknesses are also discussed and compared, so that it is easier to select the best-suited candidate for each potential application.
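    The O(n³) bottleneck and the linear-in-n sparse alternative that the abstract contrasts can be made concrete with a small numerical sketch. The NumPy snippet below is illustrative only (not the thesis code): it computes the exact GP regression posterior, whose n×n Cholesky factorization is the cubic step, and a basic subset-of-regressors style approximation with m inducing inputs Z, whose cost is O(nm²), i.e., linear in n for fixed m; SPGP and IDGPs refine this kind of approximation. All inputs and hyperparameters here are made-up toy values.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def full_gp_predict(X, y, Xs, noise=0.1):
    """Exact GP regression posterior: the Cholesky of the n x n kernel
    matrix is the O(n^3) step referred to in the abstract."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                              # O(n^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xs)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - (v**2).sum(0)      # predictive "error bars"
    return mean, var

def sparse_gp_predict(X, y, Xs, Z, noise=0.1):
    """Subset-of-regressors style approximation with m inducing inputs Z:
    only m x m systems are solved, so the cost is O(n m^2)."""
    Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))
    Knm = rbf_kernel(X, Z)
    A = noise**2 * Kmm + Knm.T @ Knm                       # m x m system
    return rbf_kernel(Xs, Z) @ np.linalg.solve(A, Knm.T @ y)

# Toy usage with made-up data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Xs = np.linspace(-3, 3, 50)[:, None]
Z = np.linspace(-3, 3, 10)[:, None]                        # m = 10 pseudo-inputs
mu_full, var_full = full_gp_predict(X, y, Xs)
mu_sparse = sparse_gp_predict(X, y, Xs, Z)
```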

    Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables

    Probabilistic graphical models (PGMs) provide a compact representation of knowledge that can be queried in a flexible way: after learning the parameters of a graphical model once, new probabilistic queries can be answered at test time without retraining. However, when using undirected PGMs with hidden variables, two sources of error typically compound in all but the simplest models: (a) learning error (both computing the partition function and integrating out the hidden variables are intractable); and (b) prediction error (exact inference is also intractable). Here we introduce query training (QT), a mechanism to learn a PGM that is optimized for the approximate inference algorithm that will be paired with it. The resulting PGM is a worse model of the data (as measured by the likelihood), but it is tuned to produce better marginals for a given inference algorithm. Unlike prior works, our approach preserves the querying flexibility of the original PGM: at test time, we can estimate the marginal of any variable given any partial evidence. We demonstrate experimentally that QT can be used to learn a challenging 8-connected grid Markov random field with hidden variables and that it consistently outperforms the state-of-the-art AdVIL when tested on three undirected models across multiple datasets.
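    The core mechanism, training model parameters so that a fixed approximate-inference procedure produces good marginals for arbitrary queries, can be sketched generically. The PyTorch snippet below is an illustrative toy, not the paper's QT implementation (which targets an 8-connected grid MRF with hidden variables): a tiny pairwise MRF over binary variables is trained by backpropagating a query loss through an unrolled mean-field loop, with random evidence masks standing in for random queries.

```python
import torch

n_vars, n_iters = 10, 5
unary = torch.zeros(n_vars, requires_grad=True)            # unary log-potentials
pair = torch.zeros(n_vars, n_vars, requires_grad=True)     # pairwise couplings

def mean_field_marginals(evidence_mask, evidence_vals):
    """Unrolled mean-field inference: returns q_i = P(x_i = 1) for all variables,
    with observed variables clamped to their evidence values."""
    W = (pair + pair.T) * (1 - torch.eye(n_vars))           # symmetric, no self-coupling
    q = torch.full((n_vars,), 0.5)
    for _ in range(n_iters):
        q = torch.sigmoid(unary + W @ q)
        q = torch.where(evidence_mask, evidence_vals, q)
    return q

opt = torch.optim.Adam([unary, pair], lr=0.05)
for step in range(200):
    # A fake training sample and a random query (observe ~half the variables,
    # predict the rest); a real setup would draw samples from the dataset.
    x = (torch.rand(n_vars) < 0.5).float()
    mask = torch.rand(n_vars) < 0.5
    mask[0] = False                                          # keep >= 1 variable queried
    q = mean_field_marginals(mask, x)
    # Query loss: negative log-likelihood of the marginals at queried positions,
    # so the parameters are tuned to the paired inference procedure.
    loss = torch.nn.functional.binary_cross_entropy(q[~mask], x[~mask])
    opt.zero_grad(); loss.backward(); opt.step()
```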

    Support vector machines with constraints for sparsity in the primal parameters

    This paper introduces a new support vector machine (SVM) formulation to obtain sparse solutions in the primal SVM parameters, providing a new method for feature selection based on SVMs. The new approach adds constraints to the classical ones that drop the weights associated with those features that are likely to be irrelevant. A ν-SVM formulation has been used, where ν indicates the fraction of features to be considered. This paper presents two versions of the proposed sparse classifier, a 2-norm SVM and a 1-norm SVM, the latter having a reduced computational burden with respect to the former. Additionally, an explanation is provided of how the presented approach can be readily extended to multiclass classification or to problems where groups of features, rather than isolated features, need to be selected. The algorithms have been tested on a variety of synthetic and real data sets and compared against other state-of-the-art SVM-based linear feature selection methods, such as the 1-norm SVM and the doubly regularized SVM. The results show the good feature selection ability of the approaches. This work was supported in part by the Ministry of Science and Innovation (Spanish Government) under Grant TEC2008-02473.
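    The feature-selection effect exploited by the 1-norm variant can be illustrated with an off-the-shelf L1-regularized linear SVM. The scikit-learn sketch below is not the paper's constrained ν-style formulation (which explicitly controls the fraction of retained features); it only shows how 1-norm regularization drives many primal weights to exactly zero, which is the sparsity mechanism both approaches build on. The dataset and hyperparameters are arbitrary toy choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic problem: 50 features, only 5 of which are informative.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

# Standard 1-norm (L1-penalized) linear SVM; the L1 penalty yields a sparse
# primal weight vector, so near-zero weights mark discarded features.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                C=0.1, max_iter=5000).fit(X, y)

w = clf.coef_.ravel()
selected = np.flatnonzero(np.abs(w) > 1e-6)
print(f"{len(selected)} of {X.shape[1]} features retained:", selected)
```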

    3D Neural Embedding Likelihood for Robust Probabilistic Inverse Graphics

    The ability to perceive and understand 3D scenes is crucial for many applications in computer vision and robotics. Inverse graphics is an appealing approach to 3D scene understanding that aims to infer the 3D scene structure from 2D images. In this paper, we introduce probabilistic modeling to the inverse graphics framework to quantify uncertainty and achieve robustness in 6D pose estimation tasks. Specifically, we propose 3D Neural Embedding Likelihood (3DNEL) as a unified probabilistic model over RGB-D images, and develop efficient inference procedures on 3D scene descriptions. 3DNEL effectively combines learned neural embeddings from RGB with depth information to improve robustness in sim-to-real 6D object pose estimation from RGB-D images. Performance on the YCB-Video dataset is on par with the state of the art yet much more robust in challenging regimes. In contrast to discriminative approaches, 3DNEL's probabilistic generative formulation jointly models multi-object scenes, quantifies uncertainty in a principled way, and handles object pose tracking under heavy occlusion. Finally, 3DNEL provides a principled framework for incorporating prior knowledge about the scene and objects, which allows natural extension to additional tasks like camera pose tracking from video.

    Graph schemas as abstractions for transfer learning, inference, and planning

    We propose schemas as a model for abstractions that can be used for rapid transfer learning, inference, and planning. Common structured representations of concepts and behaviors -- schemas -- have been proposed as a powerful way to encode abstractions. Latent graph learning is emerging as a new computational model of the hippocampus to explain map learning and transitive inference. We build on this work to show that learned latent graphs in these models have a slot structure -- schemas -- that allows for quick knowledge transfer across environments. In a new environment, an agent can rapidly learn new bindings between the sensory stream and multiple latent schemas, and select the best-fitting one to guide behavior. To evaluate these graph schemas, we use two previously published challenging tasks, the memory & planning game and one-shot StreetLearn, which are designed to test rapid task solving in novel environments. Graph schemas can be learned in far fewer episodes than previous baselines, and can be used to model and plan in a few steps in novel variations of these tasks. We further demonstrate learning, matching, and reusing graph schemas in navigation tasks in more challenging environments with aliased observations and size variations, and show how different schemas can be composed to model larger 2D and 3D environments. Comment: 12 pages, 5 figures in main paper; 12 pages and 8 figures in appendix.